
    Deep Ordinal Reinforcement Learning

    Reinforcement learning usually makes use of numerical rewards, which have nice properties but also come with drawbacks and difficulties. Using rewards on an ordinal scale (ordinal rewards) is an alternative to numerical rewards that has received more attention in recent years. In this paper, a general approach to adapting reinforcement learning problems to the use of ordinal rewards is presented and motivated. We show how to convert common reinforcement learning algorithms to an ordinal variation by the example of Q-learning and introduce Ordinal Deep Q-Networks, which adapt deep reinforcement learning to ordinal rewards. Additionally, we run evaluations on problems provided by the OpenAI Gym framework, showing that our ordinal variants exhibit a performance that is comparable to the numerical variations for a number of problems. We also give first evidence that our ordinal variant is able to produce better results for problems with less engineered and simpler-to-design reward signals. Comment: replaced figures for better visibility, added GitHub repository, more details about source of experimental results, updated target value calculation for standard and ordinal Deep Q-Networks.

    Water resources management in a homogenizing world: Averting the Growth and Underinvestment trajectory

    Biotic homogenization, a de facto symptom of a global biodiversity crisis, underscores the urgency of reforming water resources management to focus on the health and viability of ecosystems. Global population and economic growth, coupled with inadequate investment in maintenance of ecological systems, threaten to degrade environmental integrity and ecosystem services that support the global socioeconomic system, indicative of a system governed by the Growth and Underinvestment (G&U) archetype. Water resources management is linked to biotic homogenization and degradation of system integrity through alteration of water systems, ecosystem dynamics, and composition of the biota. Consistent with the G&U archetype, water resources planning primarily treats ecological considerations as exogenous constraints rather than integral, dynamic, and responsive parts of the system. It is essential that the ecological considerations be made objectives of water resources development plans to facilitate the analysis of feedbacks and potential trade-offs between socioeconomic gains and ecological losses. We call for expediting a shift to ecosystem-based management of water resources, which requires a better understanding of the dynamics and links between water resources management actions, ecological side-effects, and associated long-term ramifications for sustainability. To address existing knowledge gaps, models that include dynamics and estimated thresholds for regime shifts or ecosystem degradation need to be developed. Policy levers for implementation of ecosystem-based water resources management include shifting away from growth-oriented supply management, better demand management, increased public awareness, and institutional reform that promotes adaptive and transdisciplinary management approaches.

    Learning Best Response Strategies for Agents in Ad Exchanges

    Ad exchanges are widely used in platforms for online display advertising. Autonomous agents operating in these exchanges must learn policies for interacting profitably with a diverse, continually changing, but unknown market. We consider this problem from the perspective of a publisher, strategically interacting with an advertiser through a posted price mechanism. The learning problem for this agent is made difficult by the fact that information is censored, i.e., the publisher knows whether an impression is sold but receives no other quantitative information. We address this problem using the Harsanyi-Bellman Ad Hoc Coordination (HBA) algorithm, which conceptualises this interaction in terms of a Stochastic Bayesian Game and arrives at optimal actions by best responding with respect to probabilistic beliefs maintained over a candidate set of opponent behaviour profiles. We adapt and apply HBA to the censored information setting of ad exchanges. Also, addressing the case of stochastic opponents, we devise a strategy based on a Kaplan-Meier estimator for opponent modelling (HBA-KM). We evaluate the proposed method using simulations wherein we show that HBA-KM achieves a substantially better competitive ratio and lower variance of return than baselines, including a Q-learning agent and a UCB-based online learning agent, and performance comparable to that of the offline optimal algorithm.
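For readers unfamiliar with the Kaplan-Meier estimator named in this abstract, the following is a minimal sketch of the standard product-limit estimator for right-censored samples. It is not taken from the paper: the function name `kaplan_meier`, the `(value, observed)` sample format, and the reading of values as price levels are all assumptions made for illustration, and the paper's actual mapping from sold/unsold signals to censored observations may differ.

```python
def kaplan_meier(samples):
    """Product-limit estimate of the survival function S(v) = P(V > v).

    `samples` is a list of (value, observed) pairs (hypothetical format):
    observed=True means the quantity was seen exactly at `value`,
    observed=False means it is only known to exceed `value` (right-censored).
    Returns a list of (value, survival) pairs at each distinct observed value.
    """
    ordered = sorted(samples)
    n_at_risk = len(ordered)  # samples not yet observed or censored
    survival = 1.0
    curve = []
    i = 0
    while i < len(ordered):
        value = ordered[i][0]
        events = 0  # exact observations at this value
        count = 0   # all samples (observed or censored) at this value
        while i < len(ordered) and ordered[i][0] == value:
            events += int(ordered[i][1])
            count += 1
            i += 1
        if events:
            # Multiply by the fraction of at-risk samples that survive this value.
            survival *= 1.0 - events / n_at_risk
            curve.append((value, survival))
        n_at_risk -= count
    return curve


# Illustrative data only: five samples, two of them censored.
EXAMPLE = [(1, True), (2, False), (3, True), (4, True), (5, False)]
CURVE = kaplan_meier(EXAMPLE)
```

Censored samples never lower the curve directly; they only shrink the at-risk count used at later values, which is how the estimator makes use of partial information.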

    Pseudorehearsal in value function approximation

    Catastrophic forgetting is of special importance in reinforcement learning, as the data distribution is generally non-stationary over time. We study and compare several pseudorehearsal approaches for Q-learning with function approximation in a pole balancing task. We have found that pseudorehearsal seems to assist learning even in such very simple problems, given proper initialization of the rehearsal parameters.

    Identifying Critical States by the Action-Based Variance of Expected Return

    The balance of exploration and exploitation plays a crucial role in accelerating reinforcement learning (RL). To deploy an RL agent in human society, its explainability is also essential. However, basic RL approaches have difficulties in deciding when to choose exploitation as well as in extracting useful points for a brief explanation of its operation. One reason for the difficulties is that these approaches treat all states the same way. Here, we show that identifying critical states and treating them specially is commonly beneficial to both problems. These critical states are the states at which the action selection changes the potential of success and failure substantially. We propose to identify the critical states using the variance in the Q-function for the actions and to perform exploitation with high probability on the identified states. These simple methods accelerate RL in a grid world with cliffs and two baseline tasks of deep RL. Our results also demonstrate that the identified critical states are intuitively interpretable regarding the crucial nature of the action selection. Furthermore, our analysis of the relationship between the timing of the identification of especially critical states and the rapid progress of learning suggests there are a few especially critical states that have important information for accelerating RL rapidly. Comment: 12 pages, 6 figures.
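The idea described in this abstract, flagging a state as critical when its Q-values vary strongly across actions and then exploiting there with high probability, can be sketched for a tabular Q-function as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the fixed `threshold`, the `critical_epsilon` parameter, the function names, and the toy `Q_TABLE` are all hypothetical.

```python
import random
from statistics import pvariance


def critical_states(q_table, threshold):
    """Indices of states whose Q-values have variance over actions above threshold."""
    return [s for s, row in enumerate(q_table) if pvariance(row) > threshold]


def select_action(q_table, state, epsilon, threshold, rng, critical_epsilon=0.01):
    """Epsilon-greedy selection that explores far less often in critical states.

    `critical_epsilon` (assumed here) is the small exploration probability
    used when the state is flagged as critical; elsewhere `epsilon` applies.
    """
    row = q_table[state]
    is_critical = pvariance(row) > threshold
    explore_prob = critical_epsilon if is_critical else epsilon
    if rng.random() < explore_prob:
        return rng.randrange(len(row))  # explore: uniform random action
    return max(range(len(row)), key=row.__getitem__)  # exploit: greedy action


# Toy table: state 1 sits next to a "cliff" (one action is far worse).
Q_TABLE = [
    [1.0, 1.0, 1.0],
    [0.0, 0.0, -10.0],
    [0.5, 0.6, 0.4],
]
```

In the toy table only state 1 has a large spread between actions, so it is the only state where the choice of action matters much and where exploration is suppressed.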

    Adherence and persistence to direct oral anticoagulants in atrial fibrillation: a population-based study

    Background Despite simpler regimens than vitamin K antagonists (VKAs) for stroke prevention in atrial fibrillation (AF), adherence (taking drugs as prescribed) and persistence (continuation of drugs) to direct oral anticoagulants are suboptimal, yet understudied in electronic health records (EHRs). Objective We investigated (1) time trends at individual and system levels, and (2) the risk factors for and associations between adherence and persistence. Methods In UK primary care EHR (The Health Information Network 2011–2016), we investigated adherence and persistence at 1 year for oral anticoagulants (OACs) in adults with incident AF. Baseline characteristics were analysed by OAC and adherence/persistence status. Risk factors for non-adherence and non-persistence were assessed using Cox and logistic regression. Patterns of adherence and persistence were analysed. Results Among 36 652 individuals with incident AF, cardiovascular comorbidities (median CHA2DS2VASc [Congestive heart failure, Hypertension, Age ≥75 years, Diabetes mellitus, Stroke, Vascular disease, Age 65–74 years, Sex category] 3) and polypharmacy (median number of drugs 6) were common. Adherence was 55.2% (95% CI 54.6 to 55.7), 51.2% (95% CI 50.6 to 51.8), 66.5% (95% CI 63.7 to 69.2), 63.1% (95% CI 61.8 to 64.4) and 64.7% (95% CI 63.2 to 66.1) for all OACs, VKA, dabigatran, rivaroxaban and apixaban. One-year persistence was 65.9% (95% CI 65.4 to 66.5), 63.4% (95% CI 62.8 to 64.0), 61.4% (95% CI 58.3 to 64.2), 72.3% (95% CI 70.9 to 73.7) and 78.7% (95% CI 77.1 to 80.1) for all OACs, VKA, dabigatran, rivaroxaban and apixaban. Risk of non-adherence and non-persistence increased over time at individual and system levels. Increasing comorbidity was associated with reduced risk of non-adherence and non-persistence across all OACs. Overall rates of ‘primary non-adherence’ (stopping after first prescription), ‘non-adherent non-persistence’ and ‘persistent adherence’ were 3.5%, 26.5% and 40.2%, differing across OACs. Conclusions Adherence and persistence to OACs are low at 1 year with heterogeneity across drugs and over time at individual and system levels. Better understanding of contributory factors will inform interventions to improve adherence and persistence across OACs in individuals and populations.

    Learning from Monte Carlo Rollouts with Opponent Models for Playing Tron

    This paper describes a novel reinforcement learning system for learning to play the game of Tron. The system combines Q-learning, multi-layer perceptrons, vision grids, opponent modelling, and Monte Carlo rollouts in a novel way. By learning an opponent model, Monte Carlo rollouts can be effectively applied to generate state trajectories for all possible actions, from which improved action estimates can be computed. This makes it possible to extend experience replay so that the state-action values of all actions in a given game state are updated simultaneously. The results show that experience replay that updates the Q-values of all actions simultaneously strongly outperforms conventional experience replay, which only updates the Q-value of the performed action. The results also show that using short or long rollout horizons during training leads to similarly good performance against two fixed opponents.
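The contrast drawn in this abstract, between conventional replay that updates only the performed action and replay that updates every action from rollout-based estimates, can be illustrated with a tabular sketch. The paper uses multi-layer perceptrons rather than a table, so this is only an analogy under stated assumptions: the function names, the learning rate `lr`, and the example targets are hypothetical, and the targets stand in for whatever estimates the Monte Carlo rollouts would produce.

```python
def update_performed_action(q_row, action, target, lr):
    """Conventional replay step: move only the performed action's Q-value toward its target."""
    updated = list(q_row)
    updated[action] += lr * (target - updated[action])
    return updated


def update_all_actions(q_row, targets, lr):
    """Extended replay step: move every action's Q-value toward its own rollout-based target.

    `targets[a]` is assumed to be an improved estimate for action `a`,
    e.g. an average return over rollouts that start with that action.
    """
    if len(targets) != len(q_row):
        raise ValueError("one target per action is required")
    return [q + lr * (t - q) for q, t in zip(q_row, targets)]


# Illustrative values only: three actions, all Q-values initially zero.
Q_ROW = [0.0, 0.0, 0.0]
ROLLOUT_TARGETS = [1.0, -1.0, 0.5]

single = update_performed_action(Q_ROW, action=0, target=ROLLOUT_TARGETS[0], lr=0.5)
everything = update_all_actions(Q_ROW, ROLLOUT_TARGETS, lr=0.5)
```

After one step the conventional update has information about a single action, while the all-actions update has already ranked the three actions relative to one another.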

    Novel insights into diminished cardiac reserve in non-obstructive hypertrophic cardiomyopathy from four-dimensional flow cardiac magnetic resonance component analysis

    Aims: Hypertrophic cardiomyopathy (HCM) is characterized by hypercontractility and diastolic dysfunction, which alter blood flow haemodynamics and are linked with increased risk of adverse clinical events. Four-dimensional flow cardiac magnetic resonance (4D-flow CMR) enables comprehensive characterization of ventricular blood flow patterns. We characterized flow component changes in non-obstructive HCM and assessed their relationship with phenotypic severity and sudden cardiac death (SCD) risk. Methods and results: Fifty-one participants (37 non-obstructive HCM and 14 matched controls) underwent 4D-flow CMR. Left-ventricular (LV) end-diastolic volume was separated into four components: direct flow (blood transiting the ventricle within one cycle), retained inflow (blood entering the ventricle and retained for one cycle), delayed ejection flow (retained ventricular blood ejected during systole), and residual volume (ventricular blood retained for >two cycles). Flow component distribution and component end-diastolic kinetic energy/mL were estimated. HCM patients demonstrated greater direct flow proportions compared with controls (47.9 ± 9% vs. 39.4 ± 6%, P = 0.002), with reduction in other components. Direct flow proportions correlated with LV mass index (r = 0.40, P = 0.004), end-diastolic volume index (r = −0.40, P = 0.017), and SCD risk (r = 0.34, P = 0.039). In contrast to controls, in HCM, stroke volume decreased with increasing direct flow proportions, indicating diminished volumetric reserve. There was no difference in component end-diastolic kinetic energy/mL. Conclusion: Non-obstructive HCM possesses a distinctive flow component distribution pattern characterized by greater direct flow proportions, and direct flow-stroke volume uncoupling indicative of diminished cardiac reserve. The correlation of direct flow proportion with phenotypic severity and SCD risk highlights its potential as a novel and sensitive haemodynamic measure of cardiovascular risk in HCM.